System for automatic generation of message parser.

A code generating system utilizes automatic
compiler generators (109, 113), known as
"compiler-compilers", to automatically generate
message parser code for a message processing
system (200). A format of an incoming message is
treated as if it were the syntax of a computer
language. The problem of decoding such a message is
then equivalent to the problem of decoding a
statement expressed in higher order computer source
code typically handled by a conventional compiler.
SYSTEM FOR AUTOMATIC GENERATION OF MESSAGE PARSER



BACKGROUND OF THE INVENTION 



The invention relates to automatic computer 
code generation for the implementation of pro- 
grams for parsing input messages to a computer- 
ized message processing system. 

Computerized incoming message processing 
systems are known wherein each message must 
be "parsed" or separated into its component parts 
in a manner which will enable the programs to 
interpret the grammatical form, function and inter- 
relation of each part of the message. Convention- 
ally, message parsing programs are implemented 
by providing programmers with the legal message 
formats and syntax rules for a given message set 
and then manually coding the parsing procedures. 

Various problems are associated with using 
manual coding methods for programming such 
message parsers. The chief difficulty in developing 
code for the message decomposition is that a 
formatted message does not have a rigid, un- 
varying definition. For example, lines may be miss- 
ing or of different lengths, and fields may be ab- 
sent. Additionally, the parsing program must be 
able to consider all conditions which could occur in
a formatted message from the message set. Under 
this condition, a major problem is checking the 
parser routine for consistency and completeness. 
Portions of a manually generated parser may never 
be executed, and it is difficult to test and debug 
such a parser program. Hence the parsing routine 
may be placed into operation before many "bugs" 
or errors appear during its use. Another problem 
with prior art approaches is the difficulty in deter- 
mining the impact of changes to the parser pro- 
gram when a new or variant format is added to the 
message set. Finally, coding is labor-intensive and 
requires highly trained programmers. 

SUMMARY OF THE INVENTION 



The above problems can be minimized by us- 
ing automatic programming techniques to generate 
message parsing programming code. The format of 
a message to be processed is treated as the syn- 
tax of a high-order computer language. The prob- 
lem of decoding a message is then equivalent to 
the problem of decoding a higher order source 
code. 

Accordingly, a system for automatically generating
computer program code for use in parsing messages
having a known specification of allowable message
formats and key words comprises a lexical analyzer
generator for generating a lexical analyzer program
source code in a predetermined high level computer
code language, and key word examining means for
passing a list of the key words along with the action
to be taken when each key word is recognized to the
lexical analyzer generator. The source code of the
lexical analyzer is passed to a first compiler means
for the predetermined high level computer code
language resulting in generation of an object code
implementation of a lexical analyzer for the messages
to be parsed. Production means generates a set of
generation rule productions which defines the
allowable formats of the messages to be parsed in
accordance with the known specification, and the set
of productions is passed to a compiler means for
generating a compiler program for a high level
computer code language. The compiler generator
program is used to generate a syntax analyzer program
source code in the predetermined computer code
language and the syntax analyzer source code is
passed to a second compiler means for the
predetermined computer code language to generate an
object code implementation of a syntax analyzer for
the messages to be parsed. In this manner, the object
code for both lexical analysis and syntactical
analysis is in essence automatically generated,
thereby bypassing problems with the prior art in
manually generating such parsing program code.

By utilizing a system which converts the message
format specification into an executable program, the
message format specification is automatically checked
for consistency and completeness. New or variant
message formats can be added easily and the changes
are automatically checked to see if they interfere
with the previously working formats. After the new
specification is prepared, a new parser can be
generated in a matter of minutes with little manual
effort.

BRIEF DESCRIPTION OF THE DRAWING 



The objects and features of the invention will 
become apparent from a reading of a detailed
description taken in conjunction with the drawing in 
which FIG. 1 is a message processing data flow 
diagram arranged in accordance with the principles 
of the invention. 

DETAILED DESCRIPTION 



Computerized message processing systems
basically automate the tasks performed in the past
by human intervention at a message receiving
terminal. The message processing system receives a
message, splits it into its component parts and
places its pertinent contents into an updated data
base and perhaps displays all or a portion of the
message, etc.

As explained above, this invention contem- 
plates a system for automatically generating the 
requisite computer program code for use in decod- 
ing automatically a message input to the message 
processing system by breaking it down into its 
component parts and processing these parts with 
suitable programmed routines for updating the sys- 
tem's database and displaying information dictated 
by the nature of the incoming message. 

The invention is directed to automatically
generating the computer code necessary for parsing,
or breaking down into a predetermined structure, the
basic parts of the incoming message for computerized
analysis. The specifics of the computerized analysis
are not relevant to the invention. The analysis takes
place after the parsing, which uses a parsing program
generated automatically by the techniques of the
invention, is accomplished.

It has been found that by treating the allowable 
formats of the messages as the syntax of a com- 
puter language, known programs for automatically 
generating compilers and lexical analyzers in the 
context of higher order computer language codes 
can be utilized for automatically generating the 
code required for parsing the messages input to 
the message processing system. 

As is well known in the computer system pro- 
gramming art, any "compiler" takes as its input a 
character string, performs "lexical analysis" on that 
character string to generate a string of basic parts 
or "tokens", parses the tokens to generate typically 
a tree-form structure which then is operated upon 
by a code generation routine to produce assembly 
code corresponding to the character string input to 
the system. The assembly code is then passed 
through a conventional assembler to generate an 
object code implementation of the routine specified 
by the input character string. In this invention, the 
same tasks typically undertaken by a conventional 
compiler are performed on input messages which 
are treated as if they were a higher order computer 
code being operated upon by a compiler. One 
description of the conceptual nature and further 
details of the compilation process in computer sys- 
tems is given in Robert M. Graham, Principles of 
Systems Programming, John Wiley & Sons, Inc.
(1975). 

The system of the invention is best explained
with reference to the message processing data flow
diagram of FIG. 1. The message format specification
at 101 is taken either by a human programmer or by
an automatic program routine which examines the key
words and the specification of the message formats
at 103 and passes at 102 a list of key words to a
commercially available lexical analyzer generator
computer program at 105. One such lexical analyzer
generator is known as LEX, which is described in
M.E. Lesk, "LEX - A Lexical Analyzer Generator",
Computer Science Technical Report No. 39, Bell
Laboratories, Murray Hill, New Jersey. The LEX
generator accepts a specification for strings of
characters to be recognized and actions to be
performed when each string is found. The generator
then produces a program written in C which performs
the specified actions. Because the program is written
in C, it can be easily moved to other processors when
necessary.

The lexical analyzer C source code at 104 is
then passed to a conventional C compiler at 109
which in turn produces an object code implementation
of the lexical analyzer at 106 for lexical analyzer
203 of message processor 200. Lexical analyzer 203
is used in message processor 200 to recognize key
words, time stamps, and field delimiters. The
analyzer also strips out blank lines. Using such a
string preprocessor prior to syntax analysis results
in a more efficient implementation of the message
processor system.
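
By way of illustration only, the behavior attributed to
lexical analyzer 203 can be sketched by hand in C. The
sketch below is not LEX output. The key word list, the '/'
field delimiter, and the time stamp format of four digits
followed by 'Z' are assumptions chosen for the example and
do not come from any message format specification. Line
breaks are treated as white space, so blank lines vanish,
and an unrecognized character is returned as a single
character token.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Token kinds; every name in this sketch is a hypothetical choice. */
typedef enum { TOK_KEYWORD, TOK_TIMESTAMP, TOK_DELIM, TOK_CHAR, TOK_END } tok_kind;

typedef struct {
    tok_kind kind;
    const char *start; /* first character of the token in the input */
    size_t len;        /* number of characters in the token */
} token;

/* Assumed key word list; a real list would come from the format specification. */
static const char *const KEYWORDS[] = { "FROM", "TO", "SUBJ" };

/* Return 1 if the n characters at s exactly match one of the key words. */
static int is_keyword(const char *s, size_t n) {
    for (size_t i = 0; i < sizeof KEYWORDS / sizeof KEYWORDS[0]; i++)
        if (strlen(KEYWORDS[i]) == n && strncmp(KEYWORDS[i], s, n) == 0)
            return 1;
    return 0;
}

/* Read one token starting at *pp and advance *pp past it. */
token next_token(const char **pp) {
    const char *p = *pp;
    token t;

    /* Skip spaces, tabs and line breaks; blank lines disappear as a result. */
    while (*p == ' ' || *p == '\t' || *p == '\r' || *p == '\n')
        p++;

    t.start = p;
    if (*p == '\0') {
        t.kind = TOK_END;
        t.len = 0;
    } else if (*p == '/') {
        /* Assumed field delimiter. */
        t.kind = TOK_DELIM;
        t.len = 1;
    } else if (isdigit((unsigned char)p[0]) && isdigit((unsigned char)p[1]) &&
               isdigit((unsigned char)p[2]) && isdigit((unsigned char)p[3]) &&
               p[4] == 'Z') {
        /* Assumed time stamp format: four digits followed by 'Z'. */
        t.kind = TOK_TIMESTAMP;
        t.len = 5;
    } else {
        /* Measure a run of upper case letters and test it against the list. */
        size_t n = 0;
        while (isupper((unsigned char)p[n]))
            n++;
        if (n > 0 && is_keyword(p, n)) {
            t.kind = TOK_KEYWORD;
            t.len = n;
        } else {
            /* Unrecognized input is passed on one character at a time. */
            t.kind = TOK_CHAR;
            t.len = 1;
        }
    }
    *pp = p + t.len;
    return t;
}
```

Calling next_token repeatedly on the string
"FROM/1230Z\n\n\nTO x" yields a key word, a delimiter, a
time stamp, a second key word, a single character token for
'x', and finally the end marker.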

The next step in parser generation section 100
of FIG. 1 is for the production and key word
generator 103 to convert the message format
specification into a set of syntax-defining
generation rules called "productions". In the
preferred embodiment, the productions at 107 are in
so-called Backus-Naur Form (BNF), a well-known
notation for expressing such productions. BNF
productions are described in the above cited
Principles of Systems Programming by Robert M.
Graham.

The BNF productions at 107 are then passed to a
commercially available compiler generator routine,
also known as a "Compiler-Compiler". In this
embodiment the compiler generator 111 comprises
YACC (Yet Another Compiler-Compiler), which is
described in S.C. Johnson, "YACC: Yet Another
Compiler-Compiler", Computer Science Technical
Report No. 32, 1975, Bell Laboratories, Murray Hill,
New Jersey 07974. YACC accepts a specification
expressed in BNF productions and generates a C
program which either parses or rejects a message.
YACC 111 then generates a syntax analyzer source
code in C language at 108 which is passed to a
conventional C compiler 113 for generation at 110
of the object code implementation of syntax
analyzer 205.
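
The accept-or-reject behavior of a syntax analyzer driven
by productions can be sketched with a small hand-written
recursive descent parser in C. This is a substitute
technique used only for illustration: YACC generates a
table-driven parser, not the code shown here. The four
productions in the comment, the token kinds, and all
function names are hypothetical.

```c
#include <stddef.h>

/* Token kinds delivered by a lexical analyzer; names are assumptions. */
typedef enum { T_KEYWORD, T_TIMESTAMP, T_DELIM, T_END } tkind;

/*
 * Hypothetical productions in BNF:
 *   <message>    ::= <header> <field-list>
 *   <header>     ::= KEYWORD TIMESTAMP
 *   <field-list> ::= <field> | <field> <field-list>
 *   <field>      ::= DELIM KEYWORD
 */

/* Consume the current token if it has the wanted kind; report success. */
static int expect_tok(const tkind **pp, tkind want) {
    if (**pp != want)
        return 0;
    (*pp)++;
    return 1;
}

/* <header> ::= KEYWORD TIMESTAMP */
static int parse_header(const tkind **pp) {
    return expect_tok(pp, T_KEYWORD) && expect_tok(pp, T_TIMESTAMP);
}

/* <field> ::= DELIM KEYWORD */
static int parse_field(const tkind **pp) {
    return expect_tok(pp, T_DELIM) && expect_tok(pp, T_KEYWORD);
}

/* <field-list> ::= <field> | <field> <field-list>, written as a loop. */
static int parse_field_list(const tkind **pp) {
    if (!parse_field(pp))
        return 0;
    while (**pp == T_DELIM)
        if (!parse_field(pp))
            return 0;
    return 1;
}

/* Return 1 if the T_END-terminated token sequence is a <message>, else 0. */
int parse_message(const tkind *toks) {
    const tkind *p = toks;
    return parse_header(&p) && parse_field_list(&p) && *p == T_END;
}
```

Under these assumed productions, the token sequence
KEYWORD TIMESTAMP DELIM KEYWORD is accepted, while a
sequence that omits the time stamp or ends on a delimiter
is rejected.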

The message specification input to YACC can
be checked by machine for completeness and
consistency. The steps of generating and checking the
message specification from the message format
would have to be performed even if YACC were not
used. However, using YACC will reveal design errors
early on in development of the parser code. Once the
specification is complete, YACC will generate the
parser with no further human intervention.

FIG. 1 also depicts the data flow through the 
message processor portion 200. Typically, an 
input/output routine would place message char- 
acters in a journal file 201 as they are received. 
Input messages are sent one character at a time 
over path 202 to lexical analyzer 203 where the 
characters are grouped into "tokens" for passage 
at 204 to syntax analyzer 205. If characters cannot 
be recognized, they will not be grouped, but will be 
sent as single character tokens for passage at 204 
to syntax analyzer 205. If the message syntax is 
determined to be incorrect, the parsed message is 
sent at 208 to an error file 209 where the parsed 
message becomes a candidate for message cor- 
rection by being passed at 212 to conventional 
error correction routines 211. If the message syntax
is correct, the parsed message is sent at 206 to a 
conventional semantic analyzer 207 whereupon 
parsed messages at 210 are sent to a parsed 
message file 213. The semantic analyzer 207 ex- 
amines various fields for errors such as invalid 
parameters in various key word locations. The se- 
mantic analyzer 207 reformats items into a stan- 
dard for further processing. If those messages 
which were rejected for errors were able to be 
automatically corrected at error correction routine 
211, the corrected message is then passed in 
parsed format at 214 to the parsed message file 
213. 
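
The routing of a message through processor 200 can be
summarized in a short C sketch. The type and function
names are hypothetical, and the sketch assumes that a
rejected message which cannot be corrected simply remains
in error file 209, a point the description does not state.

```c
/* Final resting place of a message; names are illustrative assumptions. */
typedef enum { DEST_PARSED_FILE, DEST_ERROR_FILE } destination;

/* Outcome flags for one message. */
typedef struct {
    int syntax_ok;      /* nonzero if syntax analyzer 205 accepted the message */
    int auto_corrected; /* nonzero if correction routines 211 repaired it */
} outcome;

/* Decide where a message ends up, following the data flow of FIG. 1. */
destination final_destination(outcome o) {
    if (o.syntax_ok)
        return DEST_PARSED_FILE; /* via semantic analyzer 207 to file 213 */
    if (o.auto_corrected)
        return DEST_PARSED_FILE; /* via error file 209 and routines 211 */
    return DEST_ERROR_FILE;      /* assumed: stays in error file 209 */
}
```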

The invention has been described with refer- 
ence to a description of a preferred embodiment, 
the details of which have been given for the sake of 
example only. 

Many alternative embodiments would likewise 
fall within the scope of this invention. For example, 
other higher level computer languages could be 
employed, such as FORTRAN, ADA and PASCAL, 
provided a suitable lexical analyzer generator and a 
compiler-compiler, each arranged to generate 
source code in the chosen alternative higher order 
computer language, are employed in place of LEX
105 and YACC 111, respectively, of FIG. 1. 

Yet another alternative arrangement falling 
within the scope and spirit of this invention would 
be to treat the format rules of messages to be 
processed as the syntax of a computer assembly 
language. Under this approach, one could select a 
suitable assembly language-based lexical analyzer 
generator in place of LEX 105, a conventional as- 
sembler in place of compiler 109, an assembly 
language-based syntax analyzer generator in place 
of YACC 111, and a conventional assembler in 
place of compiler 113. 

The invention is to be interpreted in accordance
with the spirit and scope of the appended claims.

Claims 

1. A system for automatically generating computer
program code for use in parsing messages having
a known specification of allowable formats and key
words, the system including

- a lexical analyzer generator (105) conventionally
used with key words of a predetermined computer
programming language for generating a lexical
analyzer program source code (104) in the
predetermined language,

- a first object code converter for use with the
predetermined language,

- a syntax analyzer source code generator conve- 
niently used for generating a conversion program 
expressed in the predetermined language, and 

- a second object code converter for use with the
predetermined language,
characterized by:

- key word examining means (103) for passing a
list of the key words (102) along with the action to
be taken when each key word is recognized to the
lexical analyzer generator (105);

- the first object code converter operative to
receive the source code (104) from the lexical
analyzer generator (105) and to generate an object
code implementation (106) of a lexical analyzer
(203) for use in analyzing the messages to be
parsed;

- production means (103) for generating a set of
generation rule productions (107) defining allowable
formats of the messages to be parsed in accordance
with the known specifications;

- the syntax analyzer source code generator
operative to receive the set of productions (107) and
to generate a syntax analyzer program source code
(108) expressed in the predetermined language;
and

- the second object code converter operative to
receive the syntax analyzer program source code
(108) and to generate (110) an object code
implementation of a syntax analyzer (205) for use in
analyzing the messages to be parsed.

2. The system of claim 1, characterized in that the
predetermined computer programming language
comprises assembly language and the first and
second object code converters each comprise an
assembler for the predetermined assembly language.

3. The system of claim 1 or 2, characterized in that
the predetermined computer language comprises a
high-level language and the first and second object
code converters each comprise a compiler (109,
113) for the predetermined high-level language.

4. A system for automatically generating computer
program code for use in parsing messages having
a known specification of allowable formats and key
words, including

- a lexical analyzer generator (105) for generating a
lexical analyzer program source code (104) in a
predetermined high-level computer code language,
characterized by:

- key word examining means (103) for passing a
list of the key words (102) along with the action to
be taken when each key word is recognized to the
lexical analyzer generator (105);

- first compiler means (109) for receiving the
source code (104) from the lexical analyzer
generator (105) and for generating an object code
implementation (106) of a lexical analyzer (203) for
use in analyzing the messages to be parsed;

- production means (103) for generating a set of
generation rule productions (107) defining allowable
formats of the messages to be parsed in accordance
with the known specification;

- compiler generator means (111) for receiving the 
set of productions (107) and for generating a syn- 
tax analyzer program source code (108) in the 
predetermined computer code language; and 

- second compiler means (113) for receiving the
syntax analyzer program source code (108) and
generating an object code implementation (110) of
a syntax analyzer (205) for use in analyzing the
messages to be parsed.

5. The system of any of claims 1 through 4,
characterized by:

- journal file means (201) containing messages to
be parsed and operative to pass (202) message
characters as input to the object code
implementation of the lexical analyzer (203);
the lexical analyzer (203) operative to generate
message tokens (204) and to pass the tokens to
the object code implementation of a syntax
analyzer (205) for generation of a parsed message
(206, 208).

6. The system of any of claims 1 through 5, char- 
acterized in that the predetermined computer lan- 
guage is C. 

7. The system of any of claims 1 through 6,
characterized in that the lexical analyzer generator
(105) comprises LEX.

8. The system of any of claims 1 through 7,
characterized in that the generation rule productions
(107) are expressed in Backus-Naur form (BNF).

9. The system of any of claims 3 through 8,
characterized in that the compiler generator (111)
comprises YACC.